Abstract: This work applies the methods of signal processing
and the concepts of control system design to model the maintenance
and modulation of reading frame in the process of protein
synthesis. The model shows how translational speed can modulate
translational accuracy to accomplish programmed +1 frameshifts
and could have implications for the regulation of translational
efficiency.

A series of free energy estimates were calculated from the 
ribosome's interaction with mRNA sequences during the process 
of translation elongation in eubacteria. A sinusoidal pattern of 
roughly constant phase was detected in these free energy signals. 
Signal phase was identified as a useful parameter for locating 
programmed +1 frameshifts encoded in bacterial genes for release 
factor 2. A displacement model was developed that captures the 
mechanism of frameshift based on the information content of 
the signal parameters and the relative abundance of tRNA in the 
bacterial cell. Results are presented using experimentally verified 
frameshift genes across eubacteria. 

A set of MATLAB® programs that implement our methods 
are available upon request from the corresponding author. 



I. Introduction 

In electrical devices, input signals control device states. If
the translating ribosome followed this design, its reading frame
states, Frame 0, Frame +1 and Frame +2 (or -1), would be
controlled by an input signal. In electrical devices, control
system design takes the form of a mathematical model of
a control system algorithm which decodes input signals to
determine device state. The analytical tools of signal processing
provide methods for detecting signals, extracting them
from noise, characterizing signal parameters, and identifying
the parameters and parameter behaviors that are predictive of
device states. Using these tools requires a mathematical model
of the machine and an algorithm that simulates the machine
process.

Our previous work [1] has shown that a free energy signal
containing a periodic component of frequency f = 1/3 can
be extracted for each mRNA of a specific eubacterium. Signal
extraction is done using an algorithm that creates successive
alignments of the bacterium's 16S rRNA 3'-terminal
nucleotide tail with the mRNA sequence. For each sequence
alignment, a free energy of hybridization is calculated, the
value of which is a function of the degree of complementarity.
This algorithm simulates scanning of the mRNA by the 16S
rRNA tail, as suggested by Weiss et al. [2].
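The scanning step can be pictured with a deliberately simplified sketch. It is a toy proxy and not the free energy calculation of [1]: instead of a hybridization free energy it counts Watson-Crick and G-U pairs in each alignment and negates the count, so that more complementary alignments score lower. The function name `pairing_signal`, the pair set, and the assumption that the tail is written 3' to 5' (so that it faces the mRNA base by base) are all illustrative choices.

```python
# Toy proxy for the scanning step; NOT the free energy model used in [1].
# Allowed pairs (tail base, mRNA base): Watson-Crick plus G-U wobble (an assumption).
PAIRS = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g"), ("g", "u"), ("u", "g")}


def pairing_signal(mrna, tail):
    """Slide the tail along the mRNA and score every alignment.

    Assumes `tail` is written 3'->5', so tail[i] faces mrna[n + i].
    Each score is minus the number of paired positions, mimicking the sign
    convention of a free energy (more pairing -> more negative value).
    """
    mrna, tail = mrna.lower(), tail.lower()
    signal = []
    for n in range(len(mrna) - len(tail) + 1):
        window = mrna[n:n + len(tail)]
        pairs = sum((t, m) in PAIRS for t, m in zip(tail, window))
        signal.append(-float(pairs))
    return signal
```

A real implementation would replace the pair count with a thermodynamic model of hybridization; only the sliding-window structure carries over.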

Our hypothesis is that the free energy signal arising from 
hybridization of the 16S rRNA tail with the mRNA is the 
input signal that controls reading frame. Modulation of reading 
frame could be accomplished through this signal if it supplied 
a force that adjusted the position of the mRNA relative 
to the ribosome. The first step towards validation of this 
hypothesis is the development of a mathematical model that 
defines ribosome position as a function of free energy signal 
parameters. The second step involves experimental testing of 
model predictions. This paper presents the development of the 
mathematical model describing control system design. 

II. Signal characterization and extraction 

Our previous work [1] has shown that the free energy signal
contains a periodic f = 1/3 component embedded in noise.
A suitable model for the free energy signal is

$$y_n = \mu + A \sin\left(\frac{2\pi}{3} n + \phi\right) + z_n, \qquad n = 0, \ldots, L-1 \tag{1}$$

where $\mu$ is a constant offset, $L$ is the number of nucleotides in the
mRNA sequence, and $z_n$ is additive IID noise with mean 0 and variance
$\sigma^2$. Estimates of signal amplitude $A$ and phase $\phi$ were obtained
using a regression procedure. We found that genes belonging
to a specific organism had a roughly constant phase $\phi$ in
their free energy signals and that the mean phase angle of all
genes in the species ($\theta_{sp}$) varied linearly with species (G+C)
content [1]. However, the statistical error associated with these
estimates was large.
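One way such a regression could be carried out is sketched below. The source does not specify the procedure, so this is an assumption: an ordinary least-squares fit of a constant plus a sinusoid at frequency 1/3. When the signal length is a multiple of 3, the constant, sine and cosine regressors are mutually orthogonal, so the least-squares solution reduces to three simple sums. The name `fit_period3` is hypothetical.

```python
import math


def fit_period3(y):
    """Least-squares fit of y_n = mu + A*sin(2*pi*n/3 + phi).

    The signal is truncated to a multiple of 3 samples so that the constant,
    sine and cosine regressors are orthogonal and the fit is closed-form.
    Returns (mu, A, phi) with phi in radians.
    """
    L = len(y) - len(y) % 3
    if L == 0:
        raise ValueError("need at least 3 samples")
    y = y[:L]
    w = 2 * math.pi / 3
    mu = sum(y) / L
    # y = mu + a*sin(w n) + b*cos(w n) with a = A cos(phi), b = A sin(phi)
    a = 2 / L * sum(v * math.sin(w * n) for n, v in enumerate(y))
    b = 2 / L * sum(v * math.cos(w * n) for n, v in enumerate(y))
    return mu, math.hypot(a, b), math.atan2(b, a)
```

On a noise-free synthetic signal the function returns the generating parameters; on real free energy signals the estimates inherit the large statistical error noted above.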

The free energy signal is noisy, resulting in a low signal-to-noise
ratio (SNR). The signal periodicity of three nucleotides
can be used to improve the signal-to-noise ratio.
The noise component of the signal can be reduced by calculating
nucleotide-based averages of free energy triplets. This
approach will result in the SNR growing linearly with the
number of codons.

A. Method of accumulation 

A hypothetical memory for the ribosome system can be
created consisting of a stack of 3 registers. The memory
system maintains updates of the free energy released due to
the interaction between the 16S rRNA tail and the mRNA
sequence. As the energy values accumulate in the memory
registers, information pertaining to the reading frame gets
updated.

We denote the register contents by the vector $R^{(k)}$,
$k = 1, \ldots, L/3$, where $L/3$ is the number of codons in an mRNA
sequence. We store the first three energy values (computed
from alignments of the 16S rRNA tail with the first 3 bases
of the mRNA sequence, i.e. the first codon) in consecutive
registers, i.e.

$$R^{(1)} = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix}$$

We then accumulate, or update, the free energies from
the first codon by adding to them the free energy values
corresponding to the second codon position, resulting in

$$R^{(2)} = \begin{bmatrix} y_0 + y_3 \\ y_1 + y_4 \\ y_2 + y_5 \end{bmatrix}$$

After accumulating the signal for a length of $k$ codons, the
register contents will be

$$R^{(k)} = \begin{bmatrix} \sum_{n=0}^{k-1} y_{3n} \\ \sum_{n=0}^{k-1} y_{3n+1} \\ \sum_{n=0}^{k-1} y_{3n+2} \end{bmatrix}$$

This procedure is repeated until the last mRNA codon is
reached, i.e., until $k = L/3$.

B. Cumulative magnitude and phase 

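The register contents $R^{(k)}$ represent a snapshot of the free
energy signal pattern. The three points have a sinusoidal nature
due to the dominant periodicity of the energy pattern. This
allows us to calculate the cumulative magnitude $M_k$ and phase
$\theta_k$ by interpolation. As a result, $R^{(k)}$ can be represented as a
phasor $M_k e^{j\theta_k}$ [3]. We equate the contents of the registers,
after subtracting their mean, to points on a sine-wave and solve
Equations (2), (3) and (4) for $M_k$ and $\theta_k$. Writing $R_i^{(k)}$ for the
$i$-th register and $\bar{R}^{(k)}$ for the mean of the three registers,

$$R_0^{(k)} - \bar{R}^{(k)} = M_k \sin\left(\theta_k\right) \tag{2}$$

$$R_1^{(k)} - \bar{R}^{(k)} = M_k \sin\left(\theta_k + \frac{2\pi}{3}\right) \tag{3}$$

$$R_2^{(k)} - \bar{R}^{(k)} = M_k \sin\left(\theta_k + \frac{4\pi}{3}\right) \tag{4}$$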
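The accumulation and interpolation steps can be sketched together as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: three mean-subtracted registers that lie on a sine wave sampled 120° apart determine the phasor in closed form, because $\sum_i r_i e^{-j 2\pi i/3} = \frac{3}{2j} M e^{j\theta}$ when $r_i = M \sin(\theta + 2\pi i/3)$. The function name `cumulative_phasors` is hypothetical.

```python
import cmath
import math


def cumulative_phasors(y):
    """Return [(M_k, theta_k)] for k = 1 .. len(y)//3.

    y is the free energy signal, one value per nucleotide. Three registers
    accumulate y[3n], y[3n+1], y[3n+2]; after each codon the mean-subtracted
    registers are matched to M*sin(theta + 2*pi*i/3), i = 0, 1, 2.
    """
    regs = [0.0, 0.0, 0.0]
    out = []
    for k in range(len(y) // 3):
        for i in range(3):
            regs[i] += y[3 * k + i]
        mean = sum(regs) / 3
        s = sum((regs[i] - mean) * cmath.exp(-2j * math.pi * i / 3)
                for i in range(3))
        v = (2j / 3) * s          # v = M_k * exp(j * theta_k)
        out.append((abs(v), cmath.phase(v)))
    return out
```

For a noise-free sinusoid of amplitude A and phase φ this returns magnitude kA and phase φ after k codons, which is the behavior derived in the next subsection.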



C. Signal-to-Noise Ratio 

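Based on our free energy signal model (Equation (1)), we
have

$$M_k \sin\left(\theta_k\right) = (kA) \sin\left(\phi\right) + \left(\sum_{n=0}^{k-1} z_{3n} - \frac{1}{3} \sum_{j=0}^{3k-1} z_j\right)$$

$$M_k \sin\left(\theta_k + \frac{2\pi}{3}\right) = (kA) \sin\left(\phi + \frac{2\pi}{3}\right) + \left(\sum_{n=0}^{k-1} z_{3n+1} - \frac{1}{3} \sum_{j=0}^{3k-1} z_j\right)$$

$$M_k \sin\left(\theta_k + \frac{4\pi}{3}\right) = (kA) \sin\left(\phi + \frac{4\pi}{3}\right) + \left(\sum_{n=0}^{k-1} z_{3n+2} - \frac{1}{3} \sum_{j=0}^{3k-1} z_j\right)$$

Therefore,

$$M_k = kA$$

and

$$\sigma_k^2 = \left(\frac{2k}{3}\right) \sigma^2$$

where $\sigma_k^2$ is the noise variance of the contents of the memory
register $R^{(k)}$. The SNR of the register contents is given by

$$\frac{M_k^2}{2\sigma_k^2} = \frac{3k}{2} \left(\frac{A^2}{2\sigma^2}\right)$$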

Thus, the accumulation of points corresponding to the same 
sinusoidal pattern causes the SNR to grow linearly with the 
number of codons. 
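For readers who want the intermediate step, the register noise variance and the resulting SNR can be checked as follows. This is a sketch that assumes only the IID noise model of Equation (1); the first register is shown, and the other two are identical by symmetry.

```latex
% Noise in the first register after mean subtraction: each of the k samples
% z_{3n} appears with weight 1 - 1/3 = 2/3, and each of the remaining 2k
% samples appears with weight -1/3.
\sum_{n=0}^{k-1} z_{3n} - \frac{1}{3}\sum_{j=0}^{3k-1} z_j
  = \frac{2}{3}\sum_{n=0}^{k-1} z_{3n}
  - \frac{1}{3}\sum_{n=0}^{k-1}\left(z_{3n+1} + z_{3n+2}\right)

% Independence lets the variances add:
\sigma_k^2 = k\left(\frac{2}{3}\right)^2\sigma^2 + 2k\left(\frac{1}{3}\right)^2\sigma^2
           = \frac{2k}{3}\,\sigma^2

% With M_k = kA the SNR follows:
\frac{M_k^2}{2\sigma_k^2} = \frac{k^2 A^2}{2\cdot\frac{2k}{3}\sigma^2}
  = \frac{3k}{2}\left(\frac{A^2}{2\sigma^2}\right)
```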

D. Visualization using polar plots 

The magnitude $M_k$ and phase $\theta_k$ of the register contents
can be visualized on a polar plot, with the radial coordinate
representing magnitude and the angular coordinate representing
phase. Because the free energy signal frequency equals 1/3
cycles/nucleotide, each 120° sector of the polar plot represents
one nucleotide (see Figure 1). For the free energy signal to play
a role in reading frame determination, it would be expected
that variation in $M_k$ and/or $\theta_k$ would correlate with shifts
in reading frame. To determine if such a correlation might
exist, two genes were selected: aceF, a gene which does not
encode a frameshift, and prfB, a well-studied gene whose
mRNA sequence is known to encode a programmed frameshift
at codon 26 [4].

Although the polar plot for aceF (Figure 2) shows some
variation, the cumulative phase stays roughly constant at
about -15°, within the sector of one nucleotide. Similar phase
constancy was observed in all the 1673 verified genes in E.
coli of length 200 codons or greater [5]. However, considerable
variation in track within the nucleotide sector can occur (see
Figure 3). By comparison, the polar plots of prfB (Figures
4 and 5) are quite different. The plot starts in the same
nucleotide sector as that for aceF, but around codon 26 it
swings through approximately 240°. When the phase change
is complete, the plot re-establishes itself within a different
nucleotide sector and remains there, with small variation, to
the end of the gene. Although this result is provocative and
consistent with our hypothesis, analysis of other genes known
to encode frameshifts would strengthen the correlation.

Fig. 1. Thick lines indicate phase boundaries for each reading frame, relative
to an initial signal phase of -20°

Fig. 2. Polar plot for gene aceF in E. coli

RECODE (http://recode.genetics.utah.edu/) is a database of
non-canonical translational events such as frameshifts,
ribosomal hops and codon redefinition [6] [7]. Experimentally
verified prfB gene sequences for twelve prokaryotes other than
E. coli were obtained, their free energy signals were calculated
using the corresponding species' 16S tail, and signal parameters
were generated using the cumulative method. The prfB polar plots
for all the examined species are shown in the Appendix. A significant
phase change is observed around the frameshift location in all
these genes, consistent with the results obtained using the prfB
gene in E. coli.

Fig. 3. Polar plot for gene tsf in E. coli

Fig. 4. Partial polar plot for gene prfB in E. coli: arrow points to the location
of frameshift, marked by a *

Fig. 5. Polar plot for gene prfB in E. coli

E. Drawbacks 

Our cumulative model of signal phase, although useful for
revealing frameshift sites encoded in gene sequences, has two
drawbacks. First, for every additional codon, a greater
perturbation of the free energy signal is needed to
shift the cumulative phase, so the model will
have difficulty identifying frameshifts that occur towards
the end of a long gene sequence. Second, there is no experimental
evidence that the entire gene sequence
upstream of a frameshift site has a controlling influence on
the frameshift. The sequence elements that result in a shift
in reading frame during translation are small and can be
localized in a short sequence within the coding region [4].
To accommodate these concerns we developed a new model
that estimates instantaneous signal phase at each codon.

III. Displacement Model
A. Calculation of displacement 

For a gene without a frameshift, the polar plot would
lengthen itself radially (due to growth in magnitude) but stay
at a roughly constant phase angle ($\theta_k \approx \theta_{sp}$). When a +1
frameshift happens, the phase moves to a new nucleotide
sector, +240° or -120° away. From the prfB polar plot, we see
that the phase shifts about 60° before it gets to the frameshift
location (from approximately -20° to approximately +40°), the
equivalent of one-half of a nucleotide. Then it begins its track
at the angle that reestablishes it in the new nucleotide sector,
+240° from where it originated. We designate $x = 0$ as the
initial state, i.e., reading frame 0, which is one of the two stable
states of the ribosome-mRNA system. We assign unit increments in
$x$ for every 60° increment in phase, i.e. for every 1/2-nucleotide
shift in the mRNA sequence. If the ribosome shifts a whole
nucleotide, as it does in the +1 frameshift, we have $x = 2$. So a
+1 frameshift can be modeled as a state transition from $x = 0$
to $x = 2$. The intermediate value $x = 1$ can be thought of as
a boundary point, where there is equal likelihood of picking
either the codon in Frame 0 or the codon in Frame +1.

As stated earlier, the cumulative energy signal, owing to its
sinusoidal nature, can be represented as $V_k = M_k e^{j\theta_k}$. We
will refer to $V_k$ as the cumulative vector. The contents of $V_k$
contain a summation of the entire free energy signal up to
codon $k$. The derivative of $V_k$ with respect to codon position
$k$ gives the instantaneous energy available at codon $k$.

$$D_k = \frac{dV_k}{dk} = \left(\frac{dM_k}{dk} + j M_k \frac{d\theta_k}{dk}\right) e^{j\theta_k} \tag{8}$$

The magnitude and phase of the differential vector $D_k$,
referred to as differential magnitude and differential phase,
are given by Equation (9) and Equation (10) respectively.

$$|D_k| = \sqrt{\left(\frac{dM_k}{dk}\right)^2 + \left(M_k \frac{d\theta_k}{dk}\right)^2} \tag{9}$$

$$\angle D_k = \theta_k + \arctan\left(\frac{M_k \frac{d\theta_k}{dk}}{\frac{dM_k}{dk}}\right) \tag{10}$$

To calculate $|D_k|$ and $\angle D_k$, we will need the derivatives
$\frac{dM_k}{dk}$ and $\frac{d\theta_k}{dk}$, which can be evaluated using function
approximation techniques [8]. A second order polynomial can be
fitted to a window of points centered around $M_k$, to evaluate
its derivative, $\frac{dM_k}{dk}$. An identical procedure is followed for
computing $\frac{d\theta_k}{dk}$.

We observe that for a signal that stays roughly in phase,
$\frac{d\theta_k}{dk} \approx 0$, and so, $|D_k| \approx \frac{dM_k}{dk}$ and $\angle D_k \approx \theta_k$. We know
from previous work that the free energy signals in a given
eubacterium have a roughly constant phase [1]. For E. coli, that
angle is $\theta_{sp} \approx -20°$. For a normal, non-frameshifting gene of
length $L$ nucleotides in E. coli, we see that $\theta_k \to \theta_{sp}$ as
$k \to L/3$. Within the context of our hypothesis, the differential
vector $D_k$ represents a force acting on the ribosome at codon $k$
that adjusts the position of the ribosome relative to the mRNA,
i.e., that modulates reading frame.
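A minimal sketch of the derivative step is given below. It assumes a symmetric window of 2*half + 1 codons (the window size is not stated in the source) and uses the fact that, on a symmetric window, the slope at the center of a least-squares quadratic equals $\sum u\, y_u / \sum u^2$ over the offsets $u$. The names `center_slope` and `differential_vector` are hypothetical, list indices are 0-based, the phase sequence is assumed to be unwrapped, and `atan2` is used in place of the arctangent of Equation (10) so that a negative $dM_k/dk$ lands in the correct quadrant.

```python
import math


def center_slope(values, k, half=2):
    """Slope at index k of a least-squares quadratic fitted to
    values[k-half .. k+half]; valid for half <= k < len(values) - half."""
    offsets = range(-half, half + 1)
    num = sum(u * values[k + u] for u in offsets)
    den = sum(u * u for u in offsets)
    return num / den


def differential_vector(M, theta, k, half=2):
    """Return (|D_k|, angle of D_k) from cumulative magnitude and phase lists."""
    dM = center_slope(M, k, half)
    dth = center_slope(theta, k, half)
    mag = math.hypot(dM, M[k] * dth)                 # Equation (9)
    ang = theta[k] + math.atan2(M[k] * dth, dM)      # Equation (10)
    return mag, ang
```

For a signal with constant phase and linearly growing magnitude, the sketch returns the growth rate as the differential magnitude and the constant phase as the differential phase, matching the observation above.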

Another element believed to play an integral part in programmed
frameshifts is ribosomal pausing [4]. Sipley and
Goldman [9] provide experimental evidence that supports a
frameshift model in which ribosomal pause time is a major
determinant of frameshift probability, with pause time a
function of tRNA availability. Therefore, we introduce the
concept of wait-time, a measure of how long the ribosome
waits for the tRNA to associate with the ribosome A-site, into
our displacement model.

Codon    Amino acid    Number of wait-cycles
aac      Asn           7
ccu      Pro           16
acg      Thr           13
cuu      Leu           13
uuc      Phe           7
gca      Ala           2

TABLE I
Wait-times for a few sample codons in E. coli

B. Estimating wait-time 

The actual availability of tRNA, estimated using two-dimensional
polyacrylamide gel electrophoresis, was found to
be proportional to codon frequency for moderately expressed
genes [10]. Using a set of mRNA sequences in E. coli that
have $N$ codons in all, the frequency of each codon (except
the stop codons) can be calculated as

$$f_i = \frac{N_i}{N}, \qquad i = 1, \ldots, 61 \tag{11}$$

where $N_i$ is the number of codons of type $i$. If a particular
tRNA recognizes only one codon, then the codon frequency
would be indicative of its availability. If there is more than
one codon recognized by a tRNA isoacceptor, then the availability
of that isoacceptor will be the sum of the individual
codon frequencies. We estimate the availability of each tRNA
isoacceptor using

$$\gamma_p = \sum_{i=1}^{n_p} f_i, \qquad p = 1, \ldots, 20 \tag{12}$$

where $n_p$ is the number of codons that code for amino acid
$p$.

Codons having abundant tRNAs would have short wait-times,
and vice-versa. We assume a decreasing linear relationship
between the wait-time $\tau$ and the tRNA availability $\gamma$, as
shown in Equation (13). The wait-time gives an approximate
number of cycles for which the ribosome can adjust itself
while waiting for the appropriate tRNA. The number of wait
cycles for a few sample codons is shown in Table I.

$$\tau_p = \frac{\max(\gamma) - \gamma_p}{\min(\gamma)} \tag{13}$$
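Equations (11) to (13) can be combined into a short sketch. It groups codons by amino acid using the standard genetic code, as Equation (12) is written, and returns unrounded wait-times; how the integer wait-cycles of Table I were rounded is not stated, so no rounding is applied here. The function name `wait_times` is hypothetical, $N$ is taken to be the number of sense codons, and amino acids absent from the input are simply omitted, which is an assumption made to keep the minimum availability non-zero.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases t, c, a, g ('*' = stop).
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def wait_times(sequences):
    """Map each amino acid seen in `sequences` to its wait-time (Equation (13)).

    `sequences` are in-frame coding sequences (DNA or RNA alphabet).
    """
    counts = Counter()
    for seq in sequences:
        seq = seq.lower().replace("u", "t")
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if CODE.get(codon, "*") != "*":      # skip stops and non-ACGT codons
                counts[codon] += 1
    total = sum(counts.values())
    freq = {c: n / total for c, n in counts.items()}            # Equation (11)
    gamma = Counter()
    for codon, f in freq.items():
        gamma[CODE[codon]] += f                                  # Equation (12)
    g_max, g_min = max(gamma.values()), min(gamma.values())
    return {aa: (g_max - g) / g_min for aa, g in gamma.items()}  # Equation (13)
```

For example, a toy input with one Met codon and three Ala codons gives Ala a wait-time of 0 and Met a wait-time of 2.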



C. The complete model 

The vector $D_k$ represents a force that could produce a
linear movement of the ribosome one way or the other until
the corresponding tRNA is found for the codon in the A-site.
The displacement at each codon position is calculated
incrementally ($\Delta x$), with the sign of $\Delta x$ indicating the
direction of movement (+ = downstream, - = upstream). The
total displacement $x_k$ is obtained by accumulating $\Delta x$ for the
corresponding number of wait cycles. When the ribosome is
in reading frame 0, we define $x = 0$ and when it moves into
the +1 frame, we define $x = 2$. We claim that the following
equation captures the behavior in both reading frame states:

$$\Delta x_k = -C |D_k| \sin\left(\angle D_k + \frac{\pi x_k}{3} - \theta_{sp}\right) \tag{14}$$

Fig. 6. Vector field generated by Equation (14)

The argument of the sine function contains the instantaneous
measurement of phase:

$$\angle D_k + \frac{\pi x_k}{3} - \theta_{sp} \tag{15}$$

Observe that when $x = 0$, the cumulative phase is at the
species angle, i.e., $\angle D_k = \theta_{sp}$, leading to $\Delta x = 0$. When
$x = 2$, we have $\angle D_k = \theta_{sp} + \frac{4\pi}{3}$, again leading to $\Delta x = 0$.
To calculate $\Delta x$, we introduce a constant of proportionality
$C$, and calibrate it using the prfB signal. Mathematically,
$C$ measures the rate at which the ribosome adjusts itself to
perturbations in $x$. For each unit of wait-time (also referred
to as a wait-cycle), the incremental displacement $\Delta x_k^j$ gets
added onto the current position $x_k^j$. The total displacement is
then assigned to the next codon $k+1$. Note that we are using
the superscript $j$ to index increments made during the wait-time
of the ribosome. If the ribosome waits for $\tau$ cycles at
codon $k$, the total initial displacement at codon $k+1$ would
be assigned as

$$x_{k+1} = x_k + \sum_{j=1}^{\tau} \Delta x_k^j \tag{16}$$
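The update of Equations (14) and (16) can be sketched as a pair of nested loops: the outer loop walks over codons and the inner loop applies one increment per wait-cycle. This is an illustration under stated assumptions rather than the authors' implementation; the function name `track_displacement` is hypothetical, angles are in radians, and the inputs (differential magnitudes, differential phases and integer wait-cycles per codon) are assumed to have been computed beforehand.

```python
import math


def track_displacement(mag, ang, waits, theta_sp, C=0.005, x0=0.0):
    """Return the displacement at the start of every codon, plus the final value.

    mag[k], ang[k]: differential magnitude and phase at codon k (radians)
    waits[k]: integer number of wait-cycles at codon k
    theta_sp: species phase angle (radians); C: rate constant of Equation (14)
    """
    x = x0
    xs = [x]
    for d, a, tau in zip(mag, ang, waits):
        for _ in range(tau):
            # One wait-cycle: Equation (14) evaluated at the current position.
            x += -C * d * math.sin(a + math.pi * x / 3 - theta_sp)
        xs.append(x)              # Equation (16): carried over to codon k + 1
    return xs
```

With the differential phase held at the species angle the displacement stays at 0, and with the phase held 240° away it climbs to 2 and stays there, which are the two states discussed in the text.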



D. Stability 

In practice, all the above equations hold approximately, so
it is important to establish stability of the ribosome-mRNA
system in a rigorous manner [11]. Equation (14) can be written
as a recursive relation

$$x_k^{j+1} = x_k^j - C |D_k| \sin\left(\angle D_k + \frac{\pi x_k^j}{3} - \theta_{sp}\right) \tag{17}$$

1) Stability of $x^* = 0$: When the ribosome is in reading
frame 0, $x_k^j = 0$ and $\angle D_k = \theta_{sp}$. Substituting $x_k^j = 0$ into
Equation (17) leads to $x_k^{j+1} = x_k^j$, and hence, $x^* = 0$ is a
fixed point. Let $\eta_j = x_k^j - x^*$ be a small perturbation away
from $x^*$. To see whether the perturbation grows or decays,
we substitute $x_k^j = \eta_j + x^*$ into Equation (17). The recursive
relation can now be written as

$$x^* + \eta_{j+1} = x^* + \eta_j - C |D_k| \sin\left(\angle D_k + \frac{\pi (x^* + \eta_j)}{3} - \theta_{sp}\right)$$

Substituting $x^* = 0$, we get

$$\eta_{j+1} = \eta_j - C |D_k| \sin\left(\frac{\pi \eta_j}{3}\right) \tag{18}$$

Since $\eta_j$ is small, we have

$$\eta_{j+1} \approx \eta_j - C |D_k| \frac{\pi \eta_j}{3} = \left(1 - \frac{C \pi |D_k|}{3}\right) \eta_j$$

By making $C$ fairly small, it can be ensured that $\frac{C \pi |D_k|}{3} < 1 \;\forall k$.
This implies that $\eta_j$ decays to zero as $j$ gets large,
since $\left|1 - \frac{C \pi |D_k|}{3}\right| < 1$. Thus, small perturbations cause the
displacement to converge to the fixed point $x^* = 0$. The idea
is illustrated in Figure 6.

2) Stability of $x^* = 2$: When the ribosome is in reading
frame +1, $x_k^j = 2$ and $\angle D_k = \theta_{sp} + \frac{4\pi}{3}$. Substituting these
into Equation (17) yields $x_k^{j+1} = x_k^j$, so $x^* = 2$ is a fixed
point. For a nearby point $x_k^j = x^* + \eta_j$, the recursive relation
takes the form

$$x^* + \eta_{j+1} = x^* + \eta_j - C |D_k| \sin\left(\angle D_k + \frac{\pi (x^* + \eta_j)}{3} - \theta_{sp}\right)$$

Substituting $x^* = 2$, we get an equation identical to Equation
(18). Following identical steps, we may establish the stability
of the fixed point $x^* = 2$.

The above arguments have established that Equations
(14) and (15) are structured so that the states $x = 0$ and $x = 2$
represent stable fixed points of the ribosome-mRNA system.
Transition between the states is governed by the differential
vector $D_k$ and the time $\tau$ for which the ribosome waits at codon
$k$.
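The decay of a small perturbation can also be checked numerically. The sketch below iterates the perturbation recursion $\eta_{j+1} = \eta_j - C|D_k|\sin(\pi\eta_j/3)$ for a fixed differential magnitude; the function name `perturbation_history` and the numerical values used in any call (other than C = 0.005, which is the E. coli value reported in the Results) are illustrative assumptions.

```python
import math


def perturbation_history(eta0, C, d, steps):
    """Iterate eta_{j+1} = eta_j - C*d*sin(pi*eta_j/3) and return all values.

    eta0: initial perturbation from a fixed point; d: |D_k|, held constant.
    """
    etas = [eta0]
    for _ in range(steps):
        eta = etas[-1]
        etas.append(eta - C * d * math.sin(math.pi * eta / 3))
    return etas
```

For small perturbations the ratio of successive values is close to $1 - C\pi|D_k|/3$, so the sequence shrinks geometrically whenever that factor lies strictly between -1 and 1.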

IV. Results 

Two model parameters, the species phase angle, $\theta_{sp}$, and the
constant, $C$, must be specified to generate displacement values.
The species phase angle $\theta_{sp}$ is the mean phase angle estimated
from the set of verified genes as annotated in GenBank
(http://www.ncbi.nlm.nih.gov/Genbank/)
using the method described in [1]. For E. coli, the estimated
value is $\theta_{sp} = -13°$. For gene prfB in E. coli, the value
of $C = 0.005$ gave the highest resolution of a jump in
displacement at codon 26. These values of $\theta_{sp}$ and $C$ were
used for subsequent analyses of other genes in E. coli. The
values of these parameters for other bacteria are listed in the
Appendix. At the first codon of a gene sequence, the ribosome
is locked into Frame 0, so we use $x_1 = 0$. The stop codons
are assigned a large number of wait-cycles, typically 1000.

Fig. 7. Displacement plot for gene aceF in E. coli

Fig. 8. Displacement plot for gene prfB in E. coli



The displacement plots for the aceF and prfB genes of E.
coli are given in Figures 7 and 8 respectively. Several features
of these plots are of note. The displacement plot for aceF
(Figure 7), a gene lacking a frameshift, shows that $x \approx 0$
for the entire length of the coding region. This behavior of
$x$ indicates that our method does not detect a frameshift in
this gene, the expected result. In contrast, the displacement
plot for the prfB gene (Figure 8) shows a sudden shift in $x$ at
codon 26, the absolute value of which is slightly greater than
2 and it is in the positive direction. Our algorithm is scaled
such that a displacement value of $x = 2$ indicates a shift of
one nucleotide, so in this case, the displacement indicates a
+1 nucleotide shift in reading frame. This is also an expected
result given that codon 26 is the location of a +1 frameshift
in the prfB gene. For the remainder of the sequence, i.e., from
codon 27 to the end of the gene, the value of $x$ remains roughly
at $x = 2$. This indicates that the gene stays in the new reading
frame. The prfB displacement plots for the remaining bacteria
that we analyzed are given in the Appendix.

Link et al. [12] assessed the in vivo abundances of proteins in
E. coli using electrophoresis, and ranked the genes in decreasing
order of yield. We calculated the free energy signals for 87
such genes in E. coli, and analyzed them using our model.
We found that for 86 of these genes, $-1 < x_k < 1$ for all
values of $k$, indicating that the ribosome stays in frame for the
entire length of each sequence. For the one remaining gene,
we found slight deviation from the boundary value of $x_k = 1$
at $k = 70$, indicating a low probability of picking the in-frame
codon at that location. The polar plots and displacement plots
for 10 of these genes are included in the Appendix.

V. Discussion 

Our previous work defined an algorithm that simulates
possible hybridization between the 3'-terminal nucleotides of
the 16S rRNA and the mRNA. The algorithm revealed a
periodic free energy signal in the coding regions of the genes
in a number of bacterial species [1]. Based on the ideas of
Weiss et al. [2], Trifonov [13] and others, we hypothesized that
this free energy signal could be supplying the information to
modulate reading frame.

Using the free energy signal we developed a mathematical
model optimized to precisely predict the codon location of the
frameshift site within the prfB coding sequence. The model
is an adaptive algorithm that estimates the displacement of
the ribosome from its original reading frame (Frame 0). This
algorithm enables us to track the state of the ribosome-mRNA
system. The physical interpretation of the differential vector,
$D_k$, in the model is that it represents the amount of force
available at codon $k$ to adjust the position of the mRNA. The
amount of this adjustment potential that is actually realized
is proportional to the time the ribosome waits for a tRNA to
occupy the A-site. If the tRNA is relatively abundant, little
of the adjustment is realized; if the tRNA is rare, implying a
long pause before the A-site is occupied, more adjustment of
the mRNA relative to the ribosome occurs. The displacement
$x$ captures the position adjustment. In a recursive form, the
model starts with the previous position, derived from the
energy signal for all the codons up to but not including the
current codon, and uses the new displacement value to update
the position, or state, of the mRNA relative to the ribosome.

In the course of developing our model, we have made several
approximations and assumptions. One model assumption
is that the presence of rare codons is the only factor modulating
elongation rate. This assumption is consistent with Spirin [14],
who asserts that the wait time due to the relative abundance
of the tRNA can be assumed to be a dominating factor in
inducing frameshifts. Although mRNA secondary structure is
believed to result in ribosomal pausing, its absence from our
model is based on the observation that a strong correlation
has not been observed in all cases between mRNA secondary
structure and frameshifting [15].

A second assumption concerns the proportionality between
frequency of tRNA isoacceptor (calculated using Equation
(12)) and actual tRNA availability. This proportionality is
found to break down at low frequencies for genes encoding
highly abundant proteins [10]. The codon bias in such genes
is extreme, and this implies that the actual tRNA availability
may be more than that estimated using our simple frequency
calculation. This introduces a small error into the wait-time
estimated using Equation (13). However, this small error
would not significantly impact our overall results obtained by
assuming that the wait-time decreases linearly with our
estimated tRNA availability. Another approximation involves
the calculation of species mean phase angle $\theta_{sp}$. We have
used all the coding sequences annotated as "verified" in
the GENBANK database, leading to a large variance in the
estimate of $\theta_{sp}$. A more confident estimate may be obtained
by using genes whose authenticity has a greater degree of
certainty, such as the genes studied by Link et al. [12].

Our model has utility both as a tool that could be used
for sequence annotation and for its implications as to the
mechanism of reading frame maintenance and frameshifting.
Sequence annotation is an early objective for genome sequencing
projects. Frameshift sites are difficult to recognize [16] for
current gene annotation programs such as GENMARK [17]
and GLIMMER [18]. Our model implies that a free energy
signal that is used to maintain reading frame is encoded in
the coding regions of authentic genes. The existence of this
signal can be visualized using either polar plots of signal phase
and magnitude or displacement plots. We are currently
exploring this approach with the objective of developing an
annotation program that can identify authentic coding regions
and frameshift locations.

The utility of this model from the mechanistic perspective
is that it suggests how both reading frame maintenance and
reading frame shifts could be encoded in mRNA sequences
using translational speed to modulate positional accuracy. The
model captures the idea that the instantaneous component of
hybridization energy (whose amount is a function of the
mRNA sequence) is available to the ribosomal complex to
adjust the position of the mRNA relative to the ribosomal
decoding center by an amount that is proportional to the time
required for a tRNA or release factor to fully occupy the A-site.
The model implies that the codon bias of mRNAs could
reflect the existence of a position-adjusting mechanism to
maintain reading frame. Through codon selection, each mRNA
sequence carries the information to fine-tune the position of
each codon in the decoding center, taking into consideration
variable translational speed.

One consequence of our interpretation of the functional
significance of codon bias is that it could give insight into
the empirically demonstrated connection between native and
recombinant protein yields and codon bias. Using the free
energy signal parameters as indicators of elongation accuracy,
one way to think about our model is that it yields a qualitative
estimate of the frameshift tendency within a coding sequence.
To the degree that protein yield losses are determined by
elongation errors, such as incorrect recruitment of tRNA, our
model can show where such errors are most likely to occur
in the coding sequence. Our model can also determine which
possible sequence modifications would reduce the likelihood
of such errors. By fitting a likelihood function to the displacement
data $x_k$, we could quantify the "correctness" of a coding
sequence for translation. These predictions would then need
to be experimentally tested.

Our model also illustrates the value of applying engineering
concepts to biological systems. The translation process operates
with high reliability in potentially variable environments.
As such, it can be considered a dynamic process in which the
existence of a control system for reading frame maintenance is
a reasonable engineering assumption. Mathematical modeling
of control systems for dynamic processes has been the subject
of considerable research [19]. Signal processing techniques
have been used with considerable success to estimate the
various states of a dynamic process using noisy measurements.
The Kalman filter [20][21] is one of the most useful
control system models. This filter uses recursive updating of
the process state based on discrete sampling of input signal
information. One example application is maintaining a ship's
geographical position despite drift, a problem that bears some
similarity to the problem faced by the ribosomal complex in
maintaining reading frame.

Each cycle of translation elongation requires the ribosomal
complex to return to the same "position", i.e., the positioning
of the tRNA carrying the nascent polypeptide chain in the P-site.
The precision of this position is critical as the P-site tRNA
spatially defines the A-site boundary in the ribosomal complex
[22]. The translational process must accomplish precise positioning
of the P-site tRNA in the face of considerable process
variation, including potentially changing environmental conditions
of salt concentration, temperature, pH, and variable
process components such as tRNAs and mRNA sequences.
The requirement for the ribosomal complex to return to position
in the face of environmental perturbations is analogous
to the drift problem encountered in the ship example. In
our model the equation for calculating instantaneous phase
(Equation (15)) is analogous to the measurement equation of
a Kalman filter, and the recursive relation (Equation (17)) is
analogous to its state update equation. We have identified two
states $x = 0$ and $x = 2$ corresponding to reading frames 0
and +1, respectively. The ribosome-mRNA system is shown
to be stable in each of these two states, i.e., small perturbations
to the state $x_k$ arising from minor signal deviations will die
out eventually. Our algorithm lays the groundwork for using
adaptive filtering techniques to detect frameshifts in coding
sequences. The logical next step is to design an algorithm that
describes the transition into the -1 frame, and thereby develop
a generalized model of reading frame maintenance in bacteria.

Appendix

A. Selected eubacteria 

A set of 12 eubacteria (apart from E. coli shown in the
paper) has been selected for analysis, based on the following
factors:

• Matching of accession number between RECODE
(http://recode.genetics.utah.edu/) and
GENBANK (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi)

• Availability of a consensus sequence for the last 13 bases
of the 16S rRNA, also referred to as the 16S tail

For each species, Table II indicates

• its name

• its GENBANK accession number

• the 13 base-long 16S tail

• the GC-content of the species, expressed as a percentage

• the mean species phase angle, $\theta_{sp}$, in degrees

• the value of the parameter $C$, as defined in the model

• the number of the codon at which frameshift (FS) occurs,
according to the RECODE database (following the convention
that the first codon in the sequence, i.e. the start
codon, is numbered 1)



Name                        GenBank Acc.   16S tail         (G+C) %   θsp (deg)   C       FS codon
Borrelia burgdorferi        NC_001318      uuuccuccacuag    28.2      -63         0.005   20
Bacillus halodurans         NC_002570      uuuccuccacuag    43.7      -23         0.005   25
Bacillus subtilis           NC_000964      uuuccuccacuag    43.5      -24         0.01    25
Chlamydia muridarum         NC_002620      uuuccuccacuag    40.3      -55         0.005   24
Chlamydophila pneumoniae    NC_000922      uuuccuccacuag    40.6      -54         0.005   24
Chlamydia trachomatis       NC_000117      uuuccuccacuag    41.3      -55         0.005   24
Haemophilus influenzae      NC_000907      auuccuccacuag    38.1      -58         0.005   26
Pasteurella multocida       NC_002663      auuccuccacuag    40.4      -48         0.01    26
Streptococcus mutans        NC_004350      uuuccuccacuag    36.8      -57         0.005   28
Salmonella typhimurium      NC_003197      auuccuccacuag    52.2      3           0.005   26
Treponema pallidum          NC_000919      uuuccuccacuag    52.8      -8          0.005   25
Xylella fastidiosa          NC_002488      uuuccuccacuag    52.6      -15         0.005   26

TABLE II
Table of selected eubacteria
B. +1 Frameshift genes

Borrelia burgdorferi
Fig. 9. Polar plot
Fig. 10. Displacement plot

Fig. 12. Displacement plot

Bacillus subtilis
Fig. 14. Displacement plot

Chlamydia muridarum
Fig. 16. Displacement plot

Chlamydophila pneumoniae
Fig. 17. Polar plot
Fig. 18. Displacement plot

Fig. 20. Displacement plot

Fig. 21. Polar plot
Fig. 22. Displacement plot (horizontal axis: codon number k)

Pasteurella multocida
Fig. 24. Displacement plot

Streptococcus mutans
Fig. 26. Displacement plot

Treponema pallidum
Fig. 30. Displacement plot

Xylella fastidiosa
Fig. 31. Polar plot
Fig. 32. Displacement plot

Fig. 33. Polar plot
Fig. 34. Displacement plot

Fig. 36. Displacement plot

manX
Fig. 38. Displacement plot

mglB
Fig. 40. Displacement plot

osmC
Fig. 42. Displacement plot

rbsB
Fig. 44. Displacement plot

Fig. 46. Displacement plot

Fig. 48. Displacement plot

Fig. 50. Displacement plot

sdhA
Fig. 51. Polar plot
Fig. 52. Displacement plot